Improved SMOTE-based ensemble classification algorithm for unbalanced data
WANG Zhongzhen, HUANG Bo, FANG Zhijun, GAO Yongbin, ZHANG Juan
Journal of Computer Applications    2019, 39 (9): 2591-2596.   DOI: 10.11772/j.issn.1001-9081.2019030531

To address the low classification accuracy on unbalanced datasets, an unbalanced data classification algorithm based on improved SMOTE (Synthetic Minority Oversampling TEchnique) and the AdaBoost algorithm, called KSMOTE-AdaBoost, was proposed. Firstly, a noise-sample identification algorithm based on the idea of K-Nearest Neighbors (KNN) was proposed: according to the number of heterogeneous samples among a sample's K nearest neighbors, noise samples in the sample set were accurately identified and filtered out. Secondly, during oversampling, the sample set was divided into sub-clusters by clustering, and new samples were synthesized between the samples in each cluster and the cluster center, according to the cluster center and the number of samples the sub-cluster contains. During sample synthesis, both between-class and within-class imbalance were fully considered, and the synthesized samples were corrected in time to ensure their quality and balance the sample information. Finally, exploiting the advantages of AdaBoost, the decision tree was used as the base classifier and the balanced sample set was trained iteratively until the termination condition was satisfied, yielding the final classification model. Comparative experiments were carried out on 6 KEEL datasets with G-mean and AUC as evaluation indicators. The experimental results show that, compared with the classical oversampling algorithms SMOTE and ADASYN (ADAptive SYNthetic sampling approach), the proposed algorithm achieves the highest G-mean on 3 of the groups and the highest AUC on 4 of the groups.
Compared with the existing unbalanced classification models SMOTE-Boost, CUS-Boost (Cluster-based Under-Sampling Boost) and RUS-Boost (Random Under-Sampling Boost), across the 6 groups of data, the proposed model's G-mean is higher than that of CUS-Boost and RUS-Boost on all groups, and lower than that of SMOTE-Boost on 3 groups; its AUC is higher than that of SMOTE-Boost and RUS-Boost, and lower than that of CUS-Boost on one group. This verifies that the proposed KSMOTE-AdaBoost achieves better classification performance and higher generalization.
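The noise-filtering and intra-cluster synthesis steps described in the abstract can be sketched as follows. This is a minimal plain-Python illustration, not the paper's exact procedure: the value of `k`, the all-heterogeneous-neighbors noise threshold, and the uniform interpolation scheme are assumptions.

```python
import math
import random

def filter_noise(samples, labels, k=3):
    """Drop a sample when all k of its nearest neighbours carry a
    different label (an assumed threshold; the paper derives its rule
    from the heterogeneous-neighbour count among the K neighbours)."""
    kept_samples, kept_labels = [], []
    for i, (x, y) in enumerate(zip(samples, labels)):
        # indices of the k nearest other samples, by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(x, samples[j]))[:k]
        hetero = sum(1 for j in neighbours if labels[j] != y)
        if hetero < k:  # keep unless every neighbour disagrees
            kept_samples.append(x)
            kept_labels.append(y)
    return kept_samples, kept_labels

def synthesize(cluster, center, n_new):
    """Create new minority samples by interpolating between a random
    cluster member and the cluster centre, as the abstract describes."""
    new = []
    for _ in range(n_new):
        x = random.choice(cluster)
        t = random.random()  # position along the segment x -> center
        new.append(tuple(xi + t * (ci - xi) for xi, ci in zip(x, center)))
    return new
```

A balanced set produced this way would then be passed to AdaBoost with decision-tree base classifiers, per the abstract.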

Modified K-means clustering algorithm based on good point set and Leader method
ZHANG Yan-ping, ZHANG Juan, HE Cheng-gang, CHU Wei-cui, ZHANG Li-na
Journal of Computer Applications    2011, 31 (05): 1359-1362.   DOI: 10.3724/SP.J.1087.2011.01359
The traditional K-means algorithm is sensitive to the initial cluster centers. To solve this problem, a method was proposed to optimize the initial centers by combining the theory of good point sets with the Leader method; according to the two combination orders, the new algorithms were named KLG and KGL respectively. The good point set theory yields better-distributed points than random selection, while the Leader method reflects the distribution characteristics of the data objects. Experimental results on the UCI datasets show that the KLG and KGL algorithms significantly outperform the traditional K-means algorithm and other initialization variants.
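The Leader pass used for seeding can be sketched as below. This is a minimal illustration under assumptions: the distance threshold is a free parameter here, and the good point set construction that the KLG/KGL variants combine with it is omitted.

```python
import math

def leader_centers(points, threshold):
    """One Leader pass: a point joins the first existing leader within
    `threshold`, otherwise it founds a new leader. The surviving
    leaders reflect the data distribution and can seed K-means in
    place of randomly chosen initial centers."""
    leaders = []
    for p in points:
        if not any(math.dist(p, l) <= threshold for l in leaders):
            leaders.append(p)
    return leaders
```

Seeding K-means with these leaders, rather than random picks, is what makes the result insensitive to the initial centers.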